长期负载请求继续限制高性能处理器的性能。为了提高处理器的潜伏能力,建筑师主要依赖两种关键技术:复杂的数据预脱水和较大的芯片固定缓存。在这项工作中,我们表明:1)即使是先进的先进预摘要,也只能预测一半的外芯片负载请求,平均在广泛的工作负载中,而2)由于尺寸的增加,并且片上缓存的复杂性,花片载荷请求的延迟的很大一部分用于访问片上缓存层次结构。这项工作的目的是通过从其关键路径上删除片上缓存访问延迟来加速片外负载请求。为此,我们提出了一种称为爱马仕(Hermes)的新技术,其关键想法是:1)准确预测哪些负载请求可能会偏离芯片,2)猜测预测的芯片外载荷直接从主芯片负载所需的数据内存,同时也同时访问此类负载的高速缓存层次结构。为了启用爱马仕,我们开发了一种新的轻巧,基于智障的外芯片加载预测技术,该技术学会使用多个程序功能(例如,程序计数器的序列)来识别芯片外负载请求。对于每个负载请求,预测器都会观察一组程序功能,以预测负载是否会外芯片。如果预计负载将放置芯片,Hermes一旦生成负载的物理地址,就会直接向内存控制器发出投机请求。如果预测是正确的,则负载最终会错过缓存层次结构,并等待正在进行的投机请求完成,从而将芯片上缓存层次结构访问延迟隐藏在离芯片外负载的关键路径中。我们的评估表明,爱马仕显着提高了最先进的基线的性能。我们开源爱马仕。
translated by 谷歌翻译
过去的研究提出了许多硬件预取技术,其中大多数依赖于利用一种特定类型的程序上下文信息(例如,程序计数器,Cacheline地址)来预测未来的存储器访问。这些技术完全忽略了整个系统上的预取器的不良影响(例如,内存带宽使用),或将系统级反馈结合为返回为系统 - 不知预取算法。我们表明,由于其固有的无法在预取帐户中占用多种不同类型的程序上下文和系统级反馈信息,因此在广泛的工作负载和系统配置中往往会在广泛的工作负载和系统配置中丢失其性能效益。在本文中,我们进行了设计一个整体预取算法的案例,该算法学习使用多种不同类型的程序上下文和系统级反馈信息来预取。为此,我们提出了Pythia,它将预取器制定为钢筋学习代理。对于每种需求请求,Pythia会观察多种不同类型的程序上下文信息以进行预取决定。对于每个预取决定,Pythia接收数字奖励,该奖励评估当前内存带宽使用情况下的预取质量。 Pythia使用此奖励来加强程序上下文信息和预取决定之间的相关性,以在将来生成高度准确,及时和系统感知的预取请求。我们使用仿真和硬件综合的广泛评估表明,Pythia在各种工作负载和系统配置中优于多种最先进的预取器,同时在桌面类处理器中产生的1.03%的面积开销,并且工作负载中没有软件更改。 Pythia的源代码可以从https://github.com/cmu-safari/pythia自由下载。
translated by 谷歌翻译
Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.
translated by 谷歌翻译
Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: ''The output depends only on a small (but unknown) segment of the input.'' In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output is often used as a way to peek into the `reasoning` of the network. We make such a notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. Under such a setting, we demonstrate various error modes where an attention model can be accurate but fail to be interpretable, and show that such models do occur as a result of training. We illustrate various situations that can accentuate and mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity and demonstrate that these algorithms help improve interpretability.
translated by 谷歌翻译
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
translated by 谷歌翻译
Radiance Fields (RF) are popular to represent casually-captured scenes for new view generation and have been used for applications beyond it. Understanding and manipulating scenes represented as RFs have to naturally follow to facilitate mixed reality on personal spaces. Semantic segmentation of objects in the 3D scene is an important step for that. Prior segmentation efforts using feature distillation show promise but don't scale to complex objects with diverse appearance. We present a framework to interactively segment objects with fine structure. Nearest neighbor feature matching identifies high-confidence regions of the objects using distilled features. Bilateral filtering in a joint spatio-semantic space grows the region to recover accurate segmentation. We show state-of-the-art results of segmenting objects from RFs and compositing them to another scene, changing appearance, etc., moving closer to rich scene manipulation and understanding. Project Page: https://rahul-goel.github.io/isrf/
translated by 谷歌翻译
Reduced system dependability and higher maintenance costs may be the consequence of poor electric power quality, which can disturb normal equipment performance, speed up aging, and even cause outright failures. This study implements and tests a prototype of an Online Sequential Extreme Learning Machine (OS-ELM) classifier based on wavelets for detecting power quality problems under transient conditions. In order to create the classifier, the OSELM-network model and the discrete wavelet transform (DWT) method are combined. First, discrete wavelet transform (DWT) multi-resolution analysis (MRA) was used to extract characteristics of the distorted signal at various resolutions. The OSELM then sorts the retrieved data by transient duration and energy features to determine the kind of disturbance. The suggested approach requires less memory space and processing time since it can minimize a large quantity of the distorted signal's characteristics without changing the signal's original quality. Several types of transient events were used to demonstrate the classifier's ability to detect and categorize various types of power disturbances, including sags, swells, momentary interruptions, oscillatory transients, harmonics, notches, spikes, flickers, sag swell, sag mi, sag harm, swell trans, sag spike, and swell spike.
translated by 谷歌翻译
Arbitrary Style Transfer is a technique used to produce a new image from two images: a content image, and a style image. The newly produced image is unseen and is generated from the algorithm itself. Balancing the structure and style components has been the major challenge that other state-of-the-art algorithms have tried to solve. Despite all the efforts, it's still a major challenge to apply the artistic style that was originally created on top of the structure of the content image while maintaining consistency. In this work, we solved these problems by using a Deep Learning approach using Convolutional Neural Networks. Our implementation will first extract foreground from the background using the pre-trained Detectron 2 model from the content image, and then apply the Arbitrary Style Transfer technique that is used in SANet. Once we have the two styled images, we will stitch the two chunks of images after the process of style transfer for the complete end piece.
translated by 谷歌翻译
Machine learning (ML) algorithms are remarkably good at approximating complex non-linear relationships. Most ML training processes, however, are designed to deliver ML tools with good average performance, but do not offer any guarantees about their worst-case estimation error. For safety-critical systems such as power systems, this places a major barrier for their adoption. So far, approaches could determine the worst-case violations of only trained ML algorithms. To the best of our knowledge, this is the first paper to introduce a neural network training procedure designed to achieve both a good average performance and minimum worst-case violations. Using the Optimal Power Flow (OPF) problem as a guiding application, our approach (i) introduces a framework that reduces the worst-case generation constraint violations during training, incorporating them as a differentiable optimization layer; and (ii) presents a neural network sequential learning architecture to significantly accelerate it. We demonstrate the proposed architecture on four different test systems ranging from 39 buses to 162 buses, for both AC-OPF and DC-OPF applications.
translated by 谷歌翻译
Temporal action segmentation tags action labels for every frame in an input untrimmed video containing multiple actions in a sequence. For the task of temporal action segmentation, we propose an encoder-decoder-style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel model agnostic temporal feature augmentation strategy formed by the computationally inexpensive strategy of the stochastic max-pooling of segments. It produces more accurate and well-calibrated supervised results on three benchmark action segmentation datasets. We show that the architecture is flexible for both supervised and representation learning. In line with this, we present a novel unsupervised way to learn frame-wise representation from C2F-TCN. Our unsupervised learning approach hinges on the clustering capabilities of the input features and the formation of multi-resolution features from the decoder's implicit structure. Further, we provide the first semi-supervised temporal action segmentation results by merging representation learning with conventional supervised learning. Our semi-supervised learning scheme, called ``Iterative-Contrastive-Classify (ICC)'', progressively improves in performance with more labeled data. The ICC semi-supervised learning in C2F-TCN, with 40% labeled videos, performs similar to fully supervised counterparts.
translated by 谷歌翻译